Introduction to Reinforcement Learning

Reinforcement Learning Terminology:

Definition (the reward hypothesis): all goals can be framed as the maximization of the expected cumulative reward.

The goal is to select actions that maximize the total future reward. Actions may affect not only the immediate reward but also future rewards, so the agent must learn to balance immediate and future rewards.
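As a concrete sketch of "total future reward", the discounted return can be computed by folding the reward sequence backwards. The function name and the discount factor value below are illustrative, not from the notes:

```python
def discounted_return(rewards, gamma=0.99):
    """Total discounted future reward: G = R_1 + gamma*R_2 + gamma^2*R_3 + ...

    A discount factor gamma < 1 makes immediate rewards weigh more than
    distant ones, which is one way the immediate/future trade-off appears.
    """
    g = 0.0
    # Fold from the last reward backwards: g accumulates r + gamma * (rest).
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```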

Definition: The history is the sequence of observations, actions, and rewards.

Definition: The state is a function of the history.
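In the standard notation used for these definitions (e.g. Sutton & Barto), the history up to time t and the state as a function of it are written as:

```latex
H_t = O_1, R_1, A_1, \ldots, A_{t-1}, O_t, R_t
\qquad
S_t = f(H_t)
```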

Full observability: If the agent's sensors give it access to the complete state of the environment, then the environment is said to be fully observable: O_t = S_t^a = S_t^e.

Partial observability: If the agent's sensors give it access to only a partial state of the environment, then the environment is said to be partially observable.

A model predicts what the environment will do next. It is a simulation of the environment. The model can be used for planning and learning.

Transitions: \mathcal{P} predicts the next state.

Rewards: \mathcal{R} predicts the next immediate reward.
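A minimal sketch of such a model in a tabular setting (class and method names are hypothetical, not from the notes): transition counts approximate \mathcal{P}, the running mean reward approximates \mathcal{R}, and sampling from the learned model simulates the environment for planning.

```python
import random
from collections import defaultdict

class TabularModel:
    """Learns an approximate model of the environment from experience."""

    def __init__(self):
        # Counts of observed (s, a) -> s' transitions: approximates P.
        self.transitions = defaultdict(lambda: defaultdict(int))
        # Running sums for the mean reward per (s, a): approximates R.
        self.reward_sum = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a, r, s')."""
        self.transitions[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.count[(s, a)] += 1

    def sample(self, s, a):
        """Simulate the environment: sample s' from learned P,
        return the mean observed reward for (s, a)."""
        nexts = self.transitions[(s, a)]
        s_next = random.choices(list(nexts), weights=list(nexts.values()))[0]
        r = self.reward_sum[(s, a)] / self.count[(s, a)]
        return r, s_next
```

Simulated transitions drawn this way can then be fed to a planning or learning algorithm instead of real environment steps.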

Categorizing RL Agents

Exploration vs. Exploitation: exploration gathers more information about the environment; exploitation uses the information already known to maximize reward.
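One common way to balance the two is epsilon-greedy action selection, sketched below (the function name and parameter values are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore: pick a random action.
    Otherwise, exploit: pick the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the agent always exploits; with epsilon = 1 it always explores; intermediate values trade off the two.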

Prediction vs. Control: prediction evaluates the future under a given policy; control finds the best policy.


#MMI706 - Reinforcement Learning at METU